Performance issues scanning large accounts #137

Open
@rdegraaf

Description

Describe the bug

PMapper takes excessive time to process data pulled from some accounts with many resources. This implies that it's using a very inefficient algorithm at some point.

For instance, when I scanned an account containing 369 CloudFormation Stacks in the same region, PMapper made 4 calls to cloudformation:DescribeStacks (because of pagination limits) ending at 10:27:24, then spent the next six minutes processing before emitting any results. From the debug log:

2023-10-10 10:27:54-0700 | INFO | principalmapper.graphing.cloudformation_edges | Generating Edges based on data from CloudFormation.
2023-10-10 10:33:48-0700 | INFO | principalmapper.graphing.cloudformation_edges | Found new edge: role/ can create a stack in CloudFormation to access role/

Next up was CodeBuild. That didn't take long because the account doesn't use CodeBuild. Then came Lambda. The account has 141 Lambda functions, 129 of them in the same region. PMapper finished pulling data and started processing at 10:40:19:

2023-10-10 10:40:19-0700 | DEBUG | principalmapper.graphing.lambda_edges | Identified 141 Lambda functions for processing

I gave up waiting and killed the process 4 hours later, since my AWS credentials would have expired by then even if PMapper had eventually progressed to the next service.

During this time, the Python process running PMapper was using approximately 100% of one CPU core. The process's memory use was unremarkable, only 0.7% of available memory according to top. It was not waiting on the network and was not making repetitive calls to AWS; I attached a proxy and confirmed that no further requests were made after 10:40:19. Running with "--debug" did not reveal anything more about the problem. I tried running it under cProfile, which of course made everything orders of magnitude slower; it took 40 minutes to process AutoScaling data, as opposed to the 21 seconds it took without cProfile (a short pstats snippet for summarizing such a profile follows the log excerpt below):

2023-10-10 11:14:13-0700 | DEBUG | principalmapper.graphing.autoscaling_edges | Looking at region us-west-2
2023-10-10 11:54:29-0700 | INFO | principalmapper.graphing.autoscaling_edges | Found new edge: role/ can use the EC2 Auto Scaling service role and create a launch configuration to access role/
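In case it helps anyone reproduce the analysis: if the profile is saved to a file (e.g. with `python -m cProfile -o pmapper.prof ...`), a minimal pstats snippet like the one below will show where the time went. The filename `pmapper.prof` is just a placeholder, not the exact invocation I used.

```python
import pstats

# Load a saved cProfile dump and list the 20 functions with the highest
# cumulative time; _compose_pattern() showed up near the top in my run.
stats = pstats.Stats("pmapper.prof")
stats.sort_stats("cumulative").print_stats(20)
```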

I killed the profiled process after more than 3 hours while it was still working on CloudFormation; see pmapper-cprofile.txt for its output. Based on a quick scan of the results, it looks like _compose_pattern() in local_policy_simulation.py is a major bottleneck: the program spent about 170 minutes there. Can you optimize that at all? Maybe check whether you're repeatedly compiling the same patterns and cache the results; a rough sketch of that idea is below.
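For what it's worth, here is a sketch of the caching idea. I haven't checked _compose_pattern()'s real signature or body, so the wildcard-to-regex translation below is only my guess at what it does; the point is just that memoizing on the input string would make each distinct pattern get compiled once per process instead of once per policy evaluation.

```python
import functools
import re

# Hypothetical stand-in for _compose_pattern() in local_policy_simulation.py.
# Assumption: it converts an IAM-style wildcard string (e.g. "s3:Get*") into a
# compiled, case-insensitive regex. The lru_cache means each distinct pattern
# string is translated and compiled only once, no matter how many times the
# policy simulator asks for it.
@functools.lru_cache(maxsize=None)
def _compose_pattern(string_to_transform: str) -> re.Pattern:
    return re.compile(
        "^" + re.escape(string_to_transform).replace(r"\*", ".*").replace(r"\?", ".") + "$",
        re.IGNORECASE,
    )
```

Since the same Action/Resource patterns presumably get matched against every principal and resource while building the graph, caching alone might account for a large share of the time.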

It is possible that this issue is related to #115, though I'm not getting any permission errors.

To Reproduce

Sorry, I can't give precise details on the account because it isn't my account. By the time that someone investigates this issue, I most likely will no longer have access to the account. I don't know if the problem has to do with the number of resources or something about how they are configured.

Expected behavior

PMapper should run in a reasonable amount of time.
